Skip to content

Conversation

@chaitanyakolluru
Copy link

@chaitanyakolluru chaitanyakolluru commented Oct 1, 2025

What this PR does / why we need it:
We do not have a way to control the initial rollout of HelmReleaseProxies for every matching cluster as part of a HelmChartProxy. This change introduces a new field to HelmChartProxy spec rolloutStepSize, which can be used to stagger the initial rollout of HelmReleaseProxies based on already existing HelmReleaseProxies's ready status.

This change will allow for a controlled rollout strategy that comes into place only during initial rollout and will not impact other options for ReconcileStrategy. A new status condition HelmReleaseProxiesRolloutCompleted is added to denote initial rollout status of HelmReleaseProxies and is included when determining overall summary Ready status condition on HelmChartProxy resources. For HelmChartProxies not using the rollout option, HelmReleaseProxiesRolloutCompleted status condition is added to the resource with a reason of HelmReleaseProxiesRolloutUndefined and a message that mentions not using the rollout option.

Below are two HelmChartProxy resource status conditions, one with and the other without rollout as an option:

  • Without rolloutStepSize:
[
  {
    "lastTransitionTime": "xxx",
    "status": "True",
    "type": "Ready"
  },
  {
    "lastTransitionTime": "xxx",
    "status": "True",
    "type": "HelmReleaseProxiesReady"
  },
  {
    "lastTransitionTime": "xxx",
    "message": "HelmChartProxy does not use rollout step",
    "reason": "HelmReleaseProxiesRolloutUndefined",
    "severity": "Info",
    "status": "True",
    "type": "HelmReleaseProxiesRolloutCompleted"
  },
  {
    "lastTransitionTime": "xxx",
    "status": "True",
    "type": "HelmReleaseProxySpecsUpToDate"
  }
]
  • With rolloutStepSize:
[
  {
    "lastTransitionTime": "xxx",
    "status": "True",
    "type": "Ready"
  },
  {
    "lastTransitionTime": "xxx",
    "status": "True",
    "type": "HelmReleaseProxiesReady"
  },
  {
    "lastTransitionTime": "xxx",
    "status": "True",
    "type": "HelmReleaseProxiesRolloutCompleted"
  },
  {
    "lastTransitionTime": "xxx",
    "status": "True",
    "type": "HelmReleaseProxySpecsUpToDate"
  }
]

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 1, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: chaitanyakolluru / name: Chaitanya Kolluru (8d0c3a3)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Oct 1, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @chaitanyakolluru!

It looks like this is your first PR to kubernetes-sigs/cluster-api-addon-provider-helm 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-addon-provider-helm has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 1, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @chaitanyakolluru. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 1, 2025
@mboersma
Copy link
Contributor

mboersma commented Oct 1, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 1, 2025
Signed-off-by: Chaitanya Kolluru <[email protected]>
@chaitanyakolluru chaitanyakolluru force-pushed the helm-chart-proxy-rollout branch from 9bd4698 to d72d7f1 Compare October 1, 2025 15:57
c.Status.Conditions = conditions
}

func (c *HelmChartProxy) GetHelmReleaseProxyReadyCondition() *clusterv1.Condition {
Copy link
Member

@dmvolod dmvolod Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of function already implemented in Cluster API conditions package. Please use it instead of create.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the fn and used Get() from cluster-api/util/conditions. Thanks.

…er-api's condition package.

Signed-off-by: Chaitanya Kolluru <[email protected]>
if err != nil {
return err
}
for _, hrpRltMeta := range clusterRolloutMeta {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that the objects in the list are in a non-deterministic order, and processing them in batches during different reconciliation cycles may lead to unpredictable results. Are you sure this will be handled correctly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced orderliness on rollout meta before processing to prevent inconsistencies with different reconciliations over a list of matching clusters. Thanks for bringing this up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good

hrpReadyCond := conditions.Get(helmChartProxy, addonsv1alpha1.HelmReleaseProxiesReadyCondition)
// If HelmReleaseProxiesReadyCondition is false, reconcile existing
// HelmReleaseProxies and exit.
if hrpReadyCond != nil && (hrpReadyCond.Status == corev1.ConditionFalse) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you utilize conditions.IsTrue() or conditions.IsFalse() here and later in the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used conditions.IsFalse() when checking for false status on the condition. I don't see other instance where conditions.IsTrue() can be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Comment on lines 274 to 286
if hrpRltMeta.hrpExists {
continue
}

// Don't reconcile if the Cluster is being deleted
if !hrpRltMeta.cluster.DeletionTimestamp.IsZero() {
continue
}

err := r.reconcileForCluster(ctx, helmChartProxy, hrpRltMeta.cluster)
if err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are very similar code to the code in lines 244-255, except condition in hrpRltMeta.hrpExists. You can wrap it with function (probably anonymous) and use it here and upper.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved cluster deletion timestamp check to within the reconcileForCluster() to make those sections look cleaner. Let me know if this works for you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good

// creating HelmReleaseProxy resources for all matching clusters.
// e.g. an int (5) or percentage of count of total matching clusters (25%)
// +optional
RolloutStepSize *intstr.IntOrString `json:"rolloutStepSize,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible and currently processed changes to this field during recomcile, i.e. reset condition in case of reduce or increase it value?

Copy link
Author

@chaitanyakolluru chaitanyakolluru Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the list of matched clusters per the clusterSelector field in the spec. Once all HelmReleaseProxies have been spawned during initial rollout, condition HelmReleaseProxiesRolloutCompletedCondition is marked True and upon changes to either the clusterSelector or the list of clusters being managed, the next reconcile will notice the difference in count of clusters matched to the HelmReleaseProxies, sets condition HelmReleaseProxiesRolloutCompletedCondition to False and rolls them out according to the rolloutStepSize defined. Hope that answers the question.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chaitanyakolluru, dmvolod
Once this PR has been reviewed and has the lgtm label, please assign mboersma for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chaitanyakolluru
Copy link
Author

Could I get some eyes on this MR please. @mboersma @jackfrancis.


// reconcileForCluster will create or update a HelmReleaseProxy for the given cluster.
func (r *HelmChartProxyReconciler) reconcileForCluster(ctx context.Context, helmChartProxy *addonsv1alpha1.HelmChartProxy, cluster clusterv1.Cluster) error {
// Don't reconcile if the Cluster is being deleted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this change in this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change as part of minimizing code reuse and moved the cluster deletion timestamp check to within the method instead.

#443 (comment)

return nil
}

// getNnStringFor to retrieve the namespaced name as a string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer the more verbose getNamespacedNameStringFor

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated name to getNamespacedNameStringFor

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 16, 2025
@chaitanyakolluru
Copy link
Author

@dmvolod @mboersma @jackfrancis made a couple of updates to how it decides to rollout the next batch of HelmReleaseProxies. Please take a look when you get a chance. Thanks.

@k8s-ci-robot
Copy link
Contributor

@chaitanyakolluru: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-addon-provider-helm-e2e 8d0c3a3 link true /test pull-cluster-api-addon-provider-helm-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@chaitanyakolluru
Copy link
Author

Closed in favor of #452

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants